CSV Data
The CSV (Comma-Separated Values) input node in the workflow engine reads data from CSV files and incorporates that data into a larger workflow process. It is particularly useful when you need to integrate CSV data into automated workflows, data-processing pipelines, or other business processes.
See the Advanced section below if you need to read from the same CSV source across multiple concurrent execution paths.
Functional overview
- Configuration:
  - Input Path: Specify the path or location of the CSV file you want to read.
  - Delimiter: Choose the character used to separate values in the CSV file (e.g., comma, semicolon, tab).
  - Headers: Decide whether the CSV file's first row contains column headers, which will be used to label the data fields.
- Data Reading:
  - The CSV input node reads the specified CSV file and processes its content according to the provided configuration (see the sketch after this list).
- Output:
  - The CSV input node generates structured data from the CSV file content.
  - Each row in the CSV file becomes a data record, and each column becomes a data field.
  - If headers are enabled, the data fields are labeled with the header names from the first row of the CSV file.
- Data Integration:
  - The structured data output from the CSV input node can be integrated into the broader workflow.
  - Other nodes in the workflow can use this data for purposes such as analysis, processing, or decision-making.
- Workflow Orchestration:
  - The CSV input node is part of a larger workflow orchestrated by the workflow engine.
  - It can be connected to other nodes in the workflow, creating a sequential or parallel flow of operations.
- Triggers and Scheduling:
  - It is possible to schedule when the CSV input node reads data from the file.
  - This can be on demand or based on specific triggers.
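The configuration options above map directly onto standard CSV reading. As a rough sketch of what the node does with its configuration (the helper name `read_csv_records` and the file `subscribers.csv` are hypothetical illustrations, not part of the engine's API):

```python
import csv

def read_csv_records(input_path, delimiter=",", headers=True):
    """Hypothetical helper mirroring the node's configuration:
    input path, delimiter, and whether the first row holds headers."""
    with open(input_path, newline="") as f:
        if headers:
            # The first row labels the fields of every record.
            yield from csv.DictReader(f, delimiter=delimiter)
        else:
            # Without headers, fields are addressed by position.
            for row in csv.reader(f, delimiter=delimiter):
                yield {str(i): value for i, value in enumerate(row)}

# Each row becomes one record; each column becomes one field.
for record in read_csv_records("subscribers.csv", delimiter=","):
    print(record)  # e.g. {'name': 'Ada', 'email': 'ada@example.org'}
```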
Usage
Click the cog wheel in the item properties to edit the configuration of the CSV reader. You can select which columns to use and how they are mapped.
The CSV reader node is used within a loop. Add an outgoing edge named 'next' to process the current data record in the next steps, and use 'end reached' to continue after the input has been fully processed.
All selected columns of the data record are written to the key/value store, using the column name as the key and the record's value as the value.
The node is intended for processing inputs such as lists of names and e-mail addresses, for example to send newsletters.
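To make the loop pattern concrete, the following sketch emulates those semantics in plain Python, reusing the hypothetical `read_csv_records` helper from above and assuming the input has an 'email' column; in the engine itself this is wired up through the 'next' and 'end reached' edges rather than code:

```python
def run_reader_loop(records, kv_store, next_step):
    """Hypothetical emulation of the reader loop: the 'next' edge
    fires once per record, 'end reached' once the input is done."""
    for record in records:
        # Each selected column is written to the key/value store
        # under its column name, with the record's value.
        for column, value in record.items():
            kv_store[column] = value
        next_step(kv_store)  # follow the 'next' edge
    # Input fully processed: continue on the 'end reached' edge.

kv = {}
run_reader_loop(
    read_csv_records("subscribers.csv"),
    kv,
    lambda store: print("send newsletter to", store["email"]),
)
```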
For big-data use cases, batched reading is more appropriate; it will be supported in a future release.
The optional parameter text field lets you specify a column-name prefix, resulting in key names such as '[prefix].[column]'. The prefix is also used as the name of the reader during its lifetime in the diagram.
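For example, with a (hypothetical) prefix of 'newsletter', a record's columns land in the store under prefixed keys:

```python
kv = {}
prefix = "newsletter"  # hypothetical value entered in the text field
record = {"name": "Ada", "email": "ada@example.org"}
for column, value in record.items():
    kv[f"{prefix}.{column}"] = value
# kv now holds 'newsletter.name' and 'newsletter.email'
```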
Advanced
The data identifier can be used to refer to the datasource later in the workflow without configuring it again.
Within a single thread, the next row is returned whenever the workflow visits a datasource action. Another thread starts reading from the beginning unless it uses the same data identifier.
The datasource is closed when an end or final node is reached, and also when a thread synchronizes at a join node. Therefore, if another thread still needs to read rows after one thread has exited, the datasource must be kept open for it. This is achieved by using a unique data identifier and by enabling the 'Open only' checkbox.
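One way to picture these lifetime rules is a registry of open datasources keyed by data identifier. The sketch below (reusing the hypothetical `read_csv_records` helper) is purely illustrative of the behaviour described above, not the engine's internals:

```python
class DatasourceRegistry:
    """Illustrative model only: datasource lifetime keyed by
    data identifier, not the engine's actual implementation."""

    def __init__(self):
        self._open = {}  # data identifier -> row iterator

    def next_row(self, identifier, input_path):
        # Threads sharing a data identifier share one cursor; a
        # new identifier starts reading from the beginning.
        if identifier not in self._open:
            self._open[identifier] = read_csv_records(input_path)
        return next(self._open[identifier], None)

    def on_end_or_join(self, identifier, open_only=False):
        # End/final nodes and join synchronization normally close
        # the datasource; 'Open only' keeps it open so another
        # thread can continue reading under the same identifier.
        if not open_only:
            self._open.pop(identifier, None)
```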